N-1 Experiments Suffice to Determine the Causal Relations Among N Variables

Authors

  • Frederick Eberhardt
  • Clark Glymour
  • Richard Scheines
Abstract

By combining experimental interventions with search procedures for graphical causal models we show that under familiar assumptions, with perfect data, N − 1 experiments suffice to determine the causal relations among N > 2 variables when each experiment randomizes at most one variable. We show the same bound holds for adaptive learners, but does not hold for N > 4 when each experiment can simultaneously randomize more than one variable. This bound provides a type of ideal for the measure of success of heuristic approaches in active learning methods of causal discovery, which currently use less informative measures.

Footnote 1. Second affiliation: Florida Institute for Human and Machine Cognition.

Three Methods and Their Limitations

Consider situations in which the aim of inquiry is to determine the causal structure of a kind of system with many variables, for example the gene regulation network of a species in a particular environment. The aim, in other words, is to determine for each pair X, Y of variables in a set of variables S whether X directly causes Y (or vice versa) with respect to the remaining variables in S. That is, we ask whether there is some assignment of values V to all the remaining variables in S such that, if we were to intervene to hold those variables fixed at V while randomizing X, Y would covary with X (or vice versa). Such a system of causal relations can be represented by a directed graph, in which the variables are the nodes or vertices, and X → Y indicates that X is a direct cause of Y. If there are no feedback relations among the variables, the graph is acyclic. We are concerned with the most efficient way to determine the complete structure of such a directed acyclic graph, under some simplifying assumptions.

Suppose that, before collecting data, nothing is known that will provide positive or negative evidence about the influence of any of the variables on any of the others. There are several ways to obtain data and to make inferences:

1. Conduct a study in which all variables are passively observed, and use the inferred associations or correlations among the variables to learn as much as possible about the causal relations among the variables.
2. Conduct an experiment in which one variable is assigned values randomly (randomized) and use the inferred associations or correlations among the variables to learn as much as possible about the causal relations.
3. Do (2) while intervening to hold some other variable or variables constant.

Procedure 1 is characteristic of non-experimental social science, and it has also been proposed and pursued for discovering the structure of gene regulation networks (Spirtes et al., 2001). Consistent algorithms for causal inference from such data have been developed in computer science over the last 15 years. They rely on weak assumptions about the data generating process: the Causal Markov Assumption, which says that the direct causes of a variable screen it off from variables that are not its effects, and the Faithfulness Assumption, which says that all of the conditional independence relations are consequences of the Causal Markov Assumption applied to the directed graph representing the causal relations. Under these assumptions, consistent search algorithms are available based on conditional independence facts (the PC-Algorithm, for example; Spirtes et al., 2000), and other consistent procedures are available based on assignments of prior probabilities and computation of posterior probabilities from the data (Meek, 1996; Chickering, 2002). We will appeal to facts about such procedures in what follows, but the details of the algorithms need not concern us.

There are, however, strong limitations on what can be learned from data that satisfy these assumptions, even supplemented with other, ideal simplifications.
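The screening-off idea behind the Causal Markov Assumption, and the extra information a single randomization provides, can be illustrated with a small simulation. The sketch below is not taken from the paper: the chain X → Y → Z, the linear Gaussian mechanisms, the coefficient 0.8, the sample size, and all function names are assumptions chosen only for illustration. It uses plain Python so that it runs without extra libraries.

```python
import math
import random


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_correlation(a, b, given):
    """Correlation of a and b after linearly controlling for `given`."""
    r_ab = correlation(a, b)
    r_ag = correlation(a, given)
    r_bg = correlation(b, given)
    return (r_ab - r_ag * r_bg) / math.sqrt((1 - r_ag ** 2) * (1 - r_bg ** 2))


def sample_chain(n, rng, randomize_y=False):
    """Draw n samples from the hypothetical chain X -> Y -> Z.

    If randomize_y is True, Y is set by an independent random draw,
    which removes the influence of X on Y (procedure 2 applied to Y).
    """
    xs, ys, zs = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rng.gauss(0, 1) if randomize_y else 0.8 * x + rng.gauss(0, 1)
        z = 0.8 * y + rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


rng = random.Random(0)  # fixed seed so the run is repeatable

# Passive observation: X and Z are associated, but Y screens X off from Z.
xs, ys, zs = sample_chain(20000, rng)
r_xz = correlation(xs, zs)
r_xz_given_y = partial_correlation(xs, zs, ys)

# Experiment randomizing Y: X no longer covaries with Y, but Z still does.
xs_i, ys_i, zs_i = sample_chain(20000, rng, randomize_y=True)
r_xy_randomized = correlation(xs_i, ys_i)
r_yz_randomized = correlation(ys_i, zs_i)

print("observed corr(X, Z):        ", round(r_xz, 3))
print("observed corr(X, Z | Y):    ", round(r_xz_given_y, 3))
print("randomized-Y corr(X, Y):    ", round(r_xy_randomized, 3))
print("randomized-Y corr(Y, Z):    ", round(r_yz_randomized, 3))
```

In this toy setting the passive data show the adjacencies through a vanishing partial correlation, while the experiment that randomizes Y separates the variable that still covaries with Y (its effect, Z) from the one that does not (its cause, X).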
Thus suppose we have available the true joint probability distribution on the variables, and there are no unrecorded common causes of the variables (we say the variable set is causally sufficient), and there are no feedback relations among the variables. Under these assumptions, the algorithms can determine from the observed associations whether it is true that X and Y are adjacent, i.e., whether X directly causes Y or Y directly causes X, for all variables X, Y, but only in certain cases can the direction of causation be determined. For example, if the true structure is


Similar resources

On the Number of Experiments Sufficient and in the Worst Case Necessary to Identify All Causal Relations Among N Variables

We show that if any number of variables are allowed to be simultaneously and independently randomized in any one experiment, log2(N) + 1 experiments are sufficient and in the worst case necessary to determine the causal relations among N ≥ 2 variables when no latent variables, no sample selection bias and no feedback cycles are present. For all K, 0 < K < N/2 we provide an upper bound on the n...


A Tight Upper Bound on the Number of Variables for Average-Case k-Clique on Ordered Graphs

A first-order sentence φ defines k-clique in the average-case if $\lim_{n\to\infty} \Pr_{G=G(n,p)}[\, G \models \varphi \Leftrightarrow G \text{ has a } k\text{-clique} \,] = 1$, where G = G(n, p) is the Erdős-Rényi random graph with p = p(n) being the exact threshold such that Pr[G(n, p) has a k-clique] = 1/2. A question of interest is: How many variables are required to define average-case k-clique in first-order logic? Beyond just the usual language of...


Causal Ordering in a Mixed Structure

This paper describes a computational approach, based on the theory of causal ordering, for inferring causality from an acausal, formal description of a phenomenon. Causal ordering is an asymmetric relation among the variables in a self-contained equilibrium and dynamic structure, which seems to reflect people's intuitive notion of causal dependency relations among variables in a system. This p...


Publication date: 2005